Use universally cluster-local FQDNs for temporal-setup container#12

Open
spidercensus wants to merge 3 commits into main from jp/fix-temporal-setup

Conversation

@spidercensus
Collaborator

@spidercensus spidercensus commented May 29, 2026

Description

Summary

Updates every Temporal frontend address in deploy/helm/templates/temporal.yaml from the short Kubernetes service DNS form (<service>.<namespace>.svc) to the fully qualified cluster-local name (<service>.<namespace>.svc.cluster.local).

Problem

Several Temporal components reached the frontend through the abbreviated .svc hostname. That form usually resolves inside a cluster, but only because the pod's DNS search list expands it to the full cluster-local name; some environments and tooling do not apply that expansion and expect the full FQDN. The temporal-setup init container was especially sensitive to this: it uses nc to wait for the frontend and TEMPORAL_ADDR for namespace and bootstrap commands, so a failed lookup can block pod startup.

Changes

In deploy/helm/templates/temporal.yaml, append .cluster.local to every Temporal frontend reference:

  • temporal-setup init container: TEMPORAL_ADDR and the nc -z readiness loop
  • config-manager-worker: TEMPORAL_ADDRESS
  • Workflow worker: TEMPORAL_ADDRESS
  • Scheduler: TEMPORAL_ADDRESS
  • Gateway: TEMPORAL_ADDRESS
  • Dev UI: TEMPORAL_ADDRESS
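For reviewers, the shape of the env change can be sketched as below. This is an illustrative fragment, not the chart's actual markup: the $temporalName variable and the .Values.global.namespace and .Values.temporal.services.frontend.port paths are the ones quoted in the review comments on this PR, while the container names, images, and the single-pod layout are hypothetical placeholders chosen for brevity.

```yaml
# Illustrative fragment only: images, container names, and layout are hypothetical.
initContainers:
  - name: temporal-setup
    image: example.invalid/temporal-setup:placeholder   # hypothetical image
    env:
      # host:port in one string, read by the namespace/bootstrap commands
      - name: TEMPORAL_ADDR
        value: "{{ $temporalName }}-frontend-service.{{ .Values.global.namespace }}.svc.cluster.local:{{ .Values.temporal.services.frontend.port }}"
containers:
  - name: workflow-worker   # hypothetical name for the "Workflow worker" entry
    image: example.invalid/worker:placeholder   # hypothetical image
    env:
      # Same FQDN and port; the other components in the list follow this pattern.
      - name: TEMPORAL_ADDRESS
        value: "{{ $temporalName }}-frontend-service.{{ .Values.global.namespace }}.svc.cluster.local:{{ .Values.temporal.services.frontend.port }}"
```

Only the host portion gains the .cluster.local suffix; the port still comes from the chart values.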

Validation

  • Standard CI passes.
  • Kind integration passes, or this PR explains why it was not run.

The Kind integration test is run manually because it takes about 30 minutes to complete. When the PR is ready for review, run Actions -> Kind Integration -> Run workflow against the copy-pr-bot-generated pull-request/<PR_NUMBER> branch. Use the default test_path for the full suite, or narrow it only while debugging.

Passing Kind Integration run:

Checklist

  • I am familiar with the contributing guidelines in CONTRIBUTING.md.
  • Commits are signed off for DCO compliance.
  • New or existing tests cover these changes, or the PR explains why tests are not needed.
  • Documentation is updated for user-facing behavior changes.
  • Generated artifacts are updated when applicable, such as OpenAPI specs,
    docs screenshots, or Helm/rendered outputs.

Summary by CodeRabbit

  • Chores
    • Updated Temporal components to use fully qualified frontend domain names (.svc.cluster.local) across backend and UI services; standardized frontend addressing for workers, API, scheduler, archive, and web UI, ensuring consistent DNS resolution and uniform port-based addressing within the cluster.


@copy-pr-bot

copy-pr-bot Bot commented May 29, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented May 29, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 69bc9047-088d-4df0-8a99-877cca77adf4

📥 Commits

Reviewing files that changed from the base of the PR and between 3ce0a1b and 966a705.

📒 Files selected for processing (1)
  • deploy/helm/templates/temporal.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • deploy/helm/templates/temporal.yaml

📝 Walkthrough

Walkthrough

This PR changes Temporal Helm templates to use fully qualified frontend DNS names ending in .svc.cluster.local (including namespace and port) for init wait checks and container TEMPORAL_ADDRESS/TEMPORAL_ADDR environment variables.

Changes

Temporal component FQDN updates

| Layer | File(s) | Summary |
| --- | --- | --- |
| Init/wait checks updated | deploy/helm/templates/temporal.yaml | Init containers and non-frontend services' TCP readiness loops updated to wait on the frontend service using the full .svc.cluster.local FQDN including namespace and port. |
| Container TEMPORAL_ADDRESS env updates | deploy/helm/templates/temporal.yaml | Worker init and main containers, Temporal API, scheduler, archive, and web UI set TEMPORAL_ADDRESS/TEMPORAL_ADDR to the frontend .svc.cluster.local FQDN plus configured port. |

🎯 2 (Simple) | ⏱️ ~10 minutes

🐇 A rabbit hops through cluster rows,

frontends found where FQDN grows,
workers, API, web in line,
all resolve with cluster.local fine,
DNS made whole — the services glow.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: updating Temporal DNS addressing to use fully qualified cluster-local FQDNs across multiple components in the Helm template. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
deploy/helm/templates/temporal.yaml (1)

305-701: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Cross-file inconsistency: Other templates still use short .svc form.

While this PR correctly updates temporal.yaml to use FQDNs, other Helm templates in the chart still reference the frontend service using the short .svc form:

  • deploy/helm/templates/kubernetes-secrets.yaml (line ~186): Uses {{ $temporalName }}-frontend-service.{{ .Values.global.namespace }}.svc:{{ port }} for the [temporal] grpc_service config
  • deploy/helm/templates/_helpers.tpl (line ~709): The waitForTemporalNamespace helper defines TEMPORAL_ADDR using the short form

For consistent DNS resolution behavior across all components and environments, consider updating these templates in a follow-up change to use the same .svc.cluster.local FQDN form.
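One way to do that follow-up, sketched here under assumptions, is to render the frontend host once in named templates and include them everywhere. The helper names temporal.frontendHost and temporal.frontendAddress are hypothetical and do not exist in the chart today; the values paths are the ones quoted in this comment. Because a define block does not see the caller's $temporalName variable, the sketch passes it in through a dict.

```yaml
{{/* --- deploy/helm/templates/_helpers.tpl (hypothetical helpers) --- */}}
{{/* Call with: (dict "name" $temporalName "Values" .Values) */}}
{{- define "temporal.frontendHost" -}}
{{ .name }}-frontend-service.{{ .Values.global.namespace }}.svc.cluster.local
{{- end -}}

{{/* host:port form for TEMPORAL_ADDR / TEMPORAL_ADDRESS style values */}}
{{- define "temporal.frontendAddress" -}}
{{ include "temporal.frontendHost" . }}:{{ .Values.temporal.services.frontend.port }}
{{- end -}}

{{/* --- caller side, e.g. an env entry in temporal.yaml --- */}}
env:
  - name: TEMPORAL_ADDRESS
    value: {{ include "temporal.frontendAddress" (dict "name" $temporalName "Values" .Values) | quote }}
```

Keeping a host-only helper next to the host:port helper matters because nc -z takes host and port as separate arguments, while TEMPORAL_ADDR and the grpc_service entry take a single host:port string.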

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate.

In `@deploy/helm/templates/temporal.yaml` around lines 305 - 701, Other templates still use the short .svc DNS form causing inconsistency; update occurrences in deploy/helm/templates/kubernetes-secrets.yaml (the [temporal] grpc_service entry) and the waitForTemporalNamespace helper in deploy/helm/templates/_helpers.tpl (TEMPORAL_ADDR) to use the same FQDN format you introduced (i.e. change ".svc:{{ port }}" or ".svc {{ port }}" to ".svc.cluster.local:{{ port }}" or ".svc.cluster.local {{ port }}" as appropriate), and ensure any env vars or helper-returned addresses (e.g., TEMPORAL_ADDR) match the FQDN form so all templates use consistent "{{ $temporalName }}-frontend-service.{{ .Values.global.namespace }}.svc.cluster.local:{{ .Values.temporal.services.frontend.port }}".
🤖 Prompt for all review comments with AI agents

Inline comments:
In `@deploy/helm/templates/temporal.yaml`:
- Around line 305-310: The wait-for-frontend init container is using a short hostname form whereas the temporal-setup container sets TEMPORAL_ADDR and uses the FQDN; update the wait command in the other init container (the "wait-for-frontend" init container) so its nc -z invocation uses the same FQDN and port as TEMPORAL_ADDR (i.e., replace "{{ $temporalName }}-frontend-service" with "{{ $temporalName }}-frontend-service.{{ .Values.global.namespace }}.svc.cluster.local" and use "{{ .Values.temporal.services.frontend.port }}" for the port) so both readiness checks resolve identically across environments.
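A minimal sketch of the wait-for-frontend init container after that change, assuming it runs a shell loop around nc -z as the comment describes. The image, poll interval, and log message are placeholders, not values taken from the chart.

```yaml
# Hypothetical rendering of the wait-for-frontend init container;
# image and sleep interval are placeholders.
initContainers:
  - name: wait-for-frontend
    image: example.invalid/busybox:placeholder   # any image that ships sh and nc
    command:
      - sh
      - -c
      - |
        # nc takes host and port as separate arguments, so the FQDN and the
        # port are space-separated here rather than joined with ":".
        until nc -z {{ $temporalName }}-frontend-service.{{ .Values.global.namespace }}.svc.cluster.local {{ .Values.temporal.services.frontend.port }}; do
          echo "waiting for temporal frontend"
          sleep 2
        done
```

With this in place, the wait loop and TEMPORAL_ADDR look up the same name, so a resolution problem affects both checks the same way instead of only one.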


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a4bf3e32-ee83-46a5-a3a6-33029ca14102

📥 Commits

Reviewing files that changed from the base of the PR and between 2083d0e and de0ba2e.

📒 Files selected for processing (1)
  • deploy/helm/templates/temporal.yaml

Comment thread deploy/helm/templates/temporal.yaml
Signed-off-by: Jason Pack <jpack@nvidia.com>
@spidercensus spidercensus force-pushed the jp/fix-temporal-setup branch from 0184a0b to 3ce0a1b on May 29, 2026 20:21
Signed-off-by: Jason Pack <jpack@nvidia.com>